rMPI: Message Passing on Multicore Processors with On-Chip Interconnect

Authors

James Psota

Anant Agarwal

Abstract

With multicore processors becoming the standard architecture, programmers face the challenge of developing applications that capitalize on multicore’s advantages. This paper presents rMPI, which leverages the on-chip networks of multicore processors to build a powerful abstraction with which many programmers are familiar: the MPI programming interface. To our knowledge, rMPI is the first MPI implementation for multicore processors that have on-chip networks. This study uses the MIT Raw processor as an experimentation and validation vehicle, although the findings presented are applicable to multicore processors with on-chip networks in general. Likewise, this study uses the MPI API as a general interface that allows parallel tasks to communicate, but the results shown in this paper are generally applicable to message passing communication. Overall, rMPI’s design constitutes the marriage of message passing communication and on-chip networks, allowing programmers to apply a well-understood programming model to a high-performance multicore processor architecture. This work assesses the applicability of the MPI API to multicore processors with on-chip interconnect, and carefully analyzes the overheads associated with common MPI operations. This paper contrasts MPI with the lower-overhead network interface abstractions that the on-chip networks provide. The evaluation also compares rMPI to hand-coded applications running directly on one of the processor’s low-level on-chip networks, as well as to a commercial-quality MPI implementation running on a cluster of Ethernet-connected workstations. Results show speedups of 4x to 15x for 16 processor cores relative to one core, depending on the application, which equal or exceed the performance scalability of the MPI cluster system.
However, this paper ultimately argues that while MPI offers reasonable performance on multicores when, for instance, legacy applications must be run, its large overheads squander the multicore opportunity. Performance of multicores could be significantly improved by replacing MPI with a lighter-weight communications API with a smaller memory footprint.
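
To put the reported scaling figures in perspective, the short sketch below converts the 4x and 15x speedups on 16 cores into parallel efficiency (speedup divided by core count). The efficiency metric and the function name `parallel_efficiency` are illustrative additions, not something the abstract itself reports.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup divided by core count; 1.0 would be ideal linear scaling."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


CORES = 16
# Low and high ends of the speedup range quoted in the abstract.
REPORTED_SPEEDUPS = (4.0, 15.0)

for s in REPORTED_SPEEDUPS:
    eff = parallel_efficiency(s, CORES)
    print(f"speedup {s:g}x on {CORES} cores -> efficiency {eff:.2%}")
```

Under this definition, the two ends of the reported range correspond to efficiencies of 0.25 and 0.9375, which illustrates how strongly the result depends on the application.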


Similar resources

Message Passing On Communication-Exposed Multi-Core Processors

Next-generation microprocessors will increasingly rely on parallelism, as opposed to frequency scaling, for improvements in performance scalability. Microprocessor designers are attaining such parallelism by placing multiple processing cores on a single silicon die. Current commercial multi-core processors such as the POWER and AMD Opteron force inter-processor communication to go through the...


Multicore OSes: Looking Forward from 1991, er, 2011

Upcoming multicore processors, with hundreds of cores or more in a single chip, require a degree of parallel scalability that is not currently available in today’s system software. Based on prior experience in the supercomputing sector, the likely trend for multicore processors is away from shared memory and toward shared-nothing architectures based on message passing. In light of this, the ligh...


Towards RDF Query Processing on the Intel Single-Chip Cloud

Chip makers are envisioning hundreds of cores in future processors for throughput oriented computing. These processors, called manycore processors, require new architectural innovations for scaling to a large number of cores as compared with today’s multicore processors. We report an early study on the performance of RDF query processing on a manycore processor. In our study, we use the Intel S...


Performance of RDF Query Processing on the Intel SCC


CPHASH: A cache-partitioned hash table

CPHASH is a concurrent hash table for multicore processors. CPHASH partitions its table across the caches of cores and uses message passing to transfer lookups/inserts to a partition. CPHASH’s message passing avoids the need for locks, pipelines batches of asynchronous messages, and packs multiple messages into a single cache line transfer. Experiments on an 80-core machine with 2 hardware threa...




Publication date: 2008
